The purpose of this MA615 final project is to gain hands-on experience with the Yelp dataset. Yelp is a rating app for restaurants and other businesses that is widely used in the United States. My research goal is to explore and analyze the 50 most common stores in Las Vegas in the Yelp data. The project covers exploratory data analysis, mapping, text mining, and sentiment analysis.
The following bar plot shows the 50 most common stores in Las Vegas in the Yelp dataset. The most common is Starbucks with 145 stores, followed by Subway (120 stores), McDonald’s (70 stores), and 7-Eleven (70 stores). I analyze both the 50 stores as a group and each store individually.
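
As a rough illustration of how such a ranking can be produced, the sketch below counts business names within one city. The records, the `top_stores` helper, and the tuple layout are hypothetical stand-ins, not the project's actual code or the real Yelp schema.

```python
from collections import Counter

# Hypothetical (name, city) pairs standing in for rows of the Yelp business data.
businesses = [
    ("Starbucks", "Las Vegas"),
    ("Subway", "Las Vegas"),
    ("Starbucks", "Las Vegas"),
    ("Starbucks", "Henderson"),
    ("McDonald's", "Las Vegas"),
    ("Subway", "Las Vegas"),
    ("Starbucks", "Las Vegas"),
]

def top_stores(rows, city, n):
    """Return the n most frequent business names within one city."""
    counts = Counter(name for name, row_city in rows if row_city == city)
    return counts.most_common(n)

top = top_stores(businesses, "Las Vegas", 2)
print(top)
```

With the real data, `n` would be 50 and each count would correspond to one bar in the plot.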
The following map shows where the 145 Starbucks stores are located in Las Vegas. Many of them are on Las Vegas Boulevard, which is the center of Las Vegas.
Maps of all 50 stores together and of each store individually are available as interactive maps in the Shiny application.
The following bar plot shows the 50 most common words customers use in their reviews of stores in Las Vegas. Service, time, food, location, and customer are the most frequently used words, which matches what I would expect people to mention when reviewing a store.
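
A minimal sketch of the word-frequency step, assuming reviews are lowercased, split into words, and filtered against a stop-word list. The sample reviews and the tiny `STOP_WORDS` set are invented for illustration, and a real analysis would use a full stop-word list.

```python
import re
from collections import Counter

# Toy stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "and", "was", "is", "to", "of", "at", "this", "but", "very"}

# Hypothetical review texts.
reviews = [
    "The service was great and the food was fresh.",
    "Slow service at this location, but the food is fine.",
    "Great location and very friendly service.",
]

def word_counts(texts, stop_words):
    """Count lowercase word tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

counts = word_counts(reviews, STOP_WORDS)
print(counts.most_common(4))
```

Calling `counts.most_common(50)` on the full review set would give the words plotted in the bar chart and word cloud.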
The following Word Cloud shows another way to visualize the top 50 common words customers use in their reviews.
The following Word Cloud shows the 50 most common words customers use in their reviews of Starbucks in Las Vegas. The word clouds for all 50 stores together and for each store individually are available interactively in the Shiny application.
The Categories variable in the data classifies each store into one or more types. I did a text analysis on the Categories variable to see what types the 50 most common stores in Las Vegas fall into. The following bar plot shows the top 50 category types in the Categories variable of stores in Las Vegas. It is not surprising that the top two are food and restaurants. Even though Yelp has expanded its service beyond restaurants, they remain its main focus.
The following word cloud shows another way to visualize the top 50 Category types in the Categories variable of stores in Las Vegas.
I used the Bing lexicon, developed by Bing Liu and collaborators, for the sentiment analysis of customer reviews. The Bing lexicon classifies words as either positive or negative. The following plot shows the positive and negative word counts in the reviews of each of the 50 stores. Bars to the left of the 0 line are negative, and bars to the right are positive. It surprised me that, for each store, the frequencies of positive and negative words are similar.
The following plot shows the net sentiment (positive word count minus negative word count) for each of the 50 stores. Capriotti’s Sandwich Shop, Starbucks, and The Coffee Bean & Tea Leaf have the most positive net sentiment, whereas McDonald’s, USPS, and Pizza Hut have the most negative.
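
The lexicon-based counting and the net sentiment score can be sketched as follows. The `LEXICON` dictionary is a six-word toy stand-in for the Bing lexicon, and the store names and reviews are made up. This is only an illustration of the technique, not the project's implementation.

```python
from collections import Counter

# Toy stand-in for the Bing lexicon: word -> "positive" or "negative".
LEXICON = {
    "great": "positive", "friendly": "positive", "fresh": "positive",
    "slow": "negative", "rude": "negative", "cold": "negative",
}

# Hypothetical (store, review text) pairs.
reviews = [
    ("Store A", "great coffee and friendly staff"),
    ("Store A", "slow line but fresh pastries"),
    ("Store B", "rude cashier and cold fries"),
    ("Store B", "slow drive through, great price"),
]

def sentiment_counts(rows, lexicon):
    """Count positive and negative lexicon words per store."""
    counts = {}
    for store, text in rows:
        store_counts = counts.setdefault(store, Counter())
        for word in text.lower().split():
            label = lexicon.get(word.strip(".,!?"))
            if label:
                store_counts[label] += 1
    return counts

def net_sentiment(counts):
    """Net sentiment per store: positive count minus negative count."""
    return {store: c["positive"] - c["negative"] for store, c in counts.items()}

counts = sentiment_counts(reviews, LEXICON)
net = net_sentiment(counts)
print(net)
```

Words that do not appear in the lexicon are ignored, so the score depends only on the matched positive and negative words.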
The following plot shows the 10 most common positive and negative words in reviews of Starbucks. The same plot for each of the other stores is available interactively in the Shiny application.
The following comparison word cloud contrasts the negative and positive words people use in their reviews of Starbucks.
| Name | Sentiment | Frequency | Ratio |
|---|---|---|---|
| Starbucks | positive | 17257 | 0.5492711 |
| Starbucks | negative | 14161 | 0.4507289 |
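
The Ratio column is each sentiment's share of the total matched words. A short check using the Starbucks frequencies from the table (the `sentiment_ratios` helper name is illustrative):

```python
# Frequencies taken from the table above.
starbucks = {"positive": 17257, "negative": 14161}

def sentiment_ratios(counts):
    """Return each label's share of the total word count."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

ratios = sentiment_ratios(starbucks)
print(ratios)
```

Here the total is 17257 + 14161 = 31418 matched words, so positive words make up roughly 55% of them.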